Moment based estimation of stochastic Kronecker 

graph parameters 



Stochastic Kronecker graphs supply a parsimonious model for large 
sparse real world graphs. They can specify the distribution of a large 
random graph using only three or four parameters. Those parameters 
have however proved difficult to choose in specific applications. This ar- 
ticle looks at method of moments estimators that are computationally 
much simpler than maximum likelihood. The estimators are fast and in 
our examples, they typically yield Kronecker parameters with expected 
feature counts closer to a given graph than we get from KronFit. The 
improvement was especially prominent for the number of triangles in the 
graph. 

1 Introduction 

Stochastic Kronecker graphs were introduced by [6] as a method for simulating 
very large random graphs. Random synthetic graphs are used to test graph 
algorithms and to understand observed properties of graphs. By using simulated 
graphs, instead of real measured ones, it is possible to test algorithms on graphs 
larger or denser than presently observed ones. Simulated graphs also allow one 
to judge which features of a real graph are likely to hold in other similar graphs 
and which are idiosyncratic to the given data. 

Stochastic Kronecker graphs are able to serve these purposes through a 
model that has only three or four parameters. Parameter estimation poses 
unique challenges for those graphs. The main problem is that for a graph with 
N nodes, the likelihood has contributions from N\ permutations of the nodes 
[7]. In practice, many thousands or millions of randomly sampled permutations 
are used to estimate the likelihood. Even then it takes more than O(iV^) work 
to evaluate the likelihood contribution from one of the permutations. 
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Abstract 



In this paper wc present a method of moments strategy for parameter es- 
timation. While moment methods can be inefHcient compared to maximum 
likeUhood, statistical efficiency is of reduced importance for enormous samples 
and in settings whc;rc the dominant error is lack of fit. The method equates 
expected to observed counts for edges, triangles, hairpins (2-stars or wedges) 
and tripins (3-stars). The Kronecker model gives quite tractable formulas for 
those moments. 

The outline of this paper is as follows. Section 2 defines Kronecker graphs 
and introduces some notation. Section 3 derives the expected feature counts. 
Section 4 describes how to solve method of moment equations for the parameters 
of the Kronecker graph model. Section 5 presents some examples on fitting Kro- 
necker models to some real world graphs. We compare several moment based 
ways to estimate Kronecker graph parameters and find the most reliable results 
come from a criterion that sums squared relative errors between observed and 
expected features. We find that the fitted Kronecker models usually underesti- 
mate the number of triangles compared to the real graphs. While our parameter 
estimates underestimate triangle counts and some other features, we find that 
they provide much closer matches than some previously published parameters 
fit by KronFit. Section 6 fits parameters to graphs that were randomly gener- 
ated from the Kronecker model. We find that the estimated parameters closely 
track their generating values, with some small bias when a parameter is at the 
extreme range of valid values. Section 7 has our conclusions. 

The data for our examples is online at 

https : //dgleich . coin/gitweb/?p=kgmoineiits ; a=suimnary 
along with the code used to estimate Kronecker parameters. 

2 The Kronecker model 

Given a node set M of cardinality A'' > 1, and a matrix Pjj e [0,1] defined 
over i,j G A/", a random graph G*{P) is one where the edge [ij] exists with 
probability Pij and all N'^ edges exist or don't independently. The graph G* 
includes loops [m] and may possibly include both [ij] and [ji]. Wc snip these 
out by defining the random graph G{P) with edges [ij] only when i ^ j and 
[max(i, j), min(«, j)] € G* , using any non-random ordering of M. Both G* and 
G are in fact probability weighted ensembles of graphs, h\it for simplicity we 
describe them as single random graphs. We assume that P is a symmetric 
matrix and so the ordering of nodes does not affect the distribution. 

The description of P allows up to N{N — l)/2 parameters that affect the 
outcome. Much more parsimonious descriptions can be made by taking P to 
be the Kronecker product of two or more smaller matrices. Recall that the 
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Kronecker product of matrices X s R™x" and Y s 
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An extremely parsimonious stochastic Kronecker graph takes P to be the 

a 0" 



r-fold Kronecker product of 9 

p _ p(r) 
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, for a,b,c€ [0,1]. That is 

0e = eM. 



If the power r is known, then only three numbers need to be specified, and with 
them we can then simulate other graphs that are like the original. Perhaps 
surprisingly, stochastic Kronecker graphs imitate many, but of course not all, of 
the important features seen in large real world graphs. See for example [7]. 
We would like to pick parameters a,b,c G [0, 1] to match the properties seen 

in a real and large graph. Parameter matrices © = and ©* = ^ 

give rise to the same graph distribution. To force identifiability, we may assume 
that a> c. 



3 Moment formulas 

The Kronecker structure in P makes certain aspects of G very tractable. For 
example, the number E of edges in G can be shown to have expectation 

E{E) = ^{{a + 2b + cy -{a + cY). (1) 

Simply counting the edges in G gives us valuable information on the parameter 
vector (a, b, c) . Because E is a, sum of independent Bernoulli random variables 
we find that Var(.E) < E{E) and so the relative uncertainty ^yWax{E) /E{E) < 
E(i?)~^/^ will be small in a graph with a large number of expected edges. 

This section derives equation (1) and similar formulas for the expected num- 
ber of features of various types. The expected feature counts require sums over 
various sets of nodes. Section 3.1 records some summation formulas that sim- 
plify that task. Then Section 3.2 turns expected feature counts into sums and 
Section 3.3 shows how those sums simplify for stochastic Kronecker matrices. 

3.1 Summation formulas 

Let i,j,k,l € A/" for a finite index set Af. A plain summation sign ^ represents 
sums over all combinations of levels of all the subscripting indices used. The 
symbol Yl includes all levels of all indices, except for any combinations where 
two or more of those indices take the same value. In several places we find that 
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sums arc easier to do over all levels of all indices, while the desired sums are 
over unique levels. Here we record some formulas to translate the second type 
into the first. 

It is elementary that 

ij ij i 

and similarly 

E* fij!^ = E fij'^ - E (fin + fiji + fiij) + 2 E (3) 

ijk ijk ij i 

When there are four indices, we get 

^ ^ fijkl — ^ ^ fijkl ^ ^ (^fijki ~l~ fijkj ~l~ fijkk ~l~ fijik H" fijjk H" fiijk^ 
ijkl ijkl ijk 

~l~ ^ ^ ^2 (.fijjj /ijii "I" ~l~ fiiij^ ~l~ fijij ~l~ fijji ~^ fiijjj ^ ^ ^ /u 



(4) 



Equation (4) is more complicated than the others. It can be proved by defining 
9ijk = J2i fijkl ~ fijki — fijkj — fijkk, writing J2ijki fijkl ~ 'l2ijk9ijk and then 
applying (3). 

In some of our formulas below, the first index is singled out but the others are 
exchangeable. By this we mean that fijk = fikj, when there are three indices, 
while fijkl = fijik = fikjl = fiki] = fiijk = fiikj is the version for four indices. 

When indices after the first are exchangeable, then equation (3) simplifies 

to 

fijk — (^fijj + + 2 /jjj, (5) 

ijk ij i 

and equation (4) simplifies to 

^ ] fijkl 3 ^ ^ f/iijfc + fijjk^ ^ fijjj ^fujj ~^ '^fiiij^ ^ ^ ] fiiii- (6) 
ijkl ijk ij i 

When all indices ijk are exchangeable, so that fijk = fikj = fjik — fjkt = 
fkij = fkji, then equation (5) simplifies to 

^ ^ fijk ~ 3 ^ ^ /iy" + 2 ^ ^ /iij. (7) 
ijk ij i 



3.2 Expected feature counts for independent edges 

The graph features we describe are shown in Figure 1. In addition to edges, 
there are hairpins (2-stars) where two edges share a common node, tripins (3- 
stars) where three edges share a node, and triangles. The Kronecker model has 



4 



Edge Hairpin Tripin Triangle 




Figure 1: This figure illustrates some of the graph features that we can count, 
for use in moment based estimates of the parameters in the stochastic Kronecker 
graph. 



independent edges. Here we find the expected feature counts for any random 
graph where edge [ij] appears with probability and edges are independent. 

Recall that G* is a random graph with Pr([ij] G G*) = Pij (independently). 
Let it have incidence matrix A*. There may be loops A*^ = 1, and for i ^ j, 
Alj and A*^ are independently generated. The graph G is formed by deleting 
loops from G* and symmetrizing the incidence matrix via 



Aij 



i=j 
A*i i<j. 



The number of edges in G is E = (1/2) X^^jAjj. The expected number of 
edges satisfies 

2 E(£) = E Ay) = Pij = ^ -Y^Piu (8) 

ij ij i 

using E{A,j) = E{A*j). 

The number of hairpins in G is H = {l/2)J2ijk^iJ^ik- Dividing by two 
adjusts the sum for counting {[ij], [ik]} twice. The expected value of H satisfies 

ijk ijk ij ij i 

by letting fijk — PijPik, for which fijk = fikj, and applying equation (5). 

The number of triangles in G is A = {i-/Q)J2ijk^ij^ik^jk, because the 
sum counts each triangle 3! = 6 times. The expected value of each term is 
fijk = PijPikPjk which is Symmetric in its three arguments and so we may 
apply equation (7) to get 

6E(A) = P^jPikPjk - 3 PiiPij + 2 E P^- 

ijk ij i 
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The number of tripins in G is T = {i/&)'^ijj^i AijAikAii. The final three 
indices in fij^i = PijPikPu are exchangeable, and so equation (6) applies. Thus 

6E(T) = ^ PijPikPii - 3 ^ PiiPijPik - 3 ^ PijPik 

ijkl ijk ijk 

+ 2 E 4- + 5 E + 4 E - 6 E ^'i- 

ij ij ij i 

3.3 Simplifying the sums 

The sums in the expected counts simplify, because of the properties of the 
Kronecker graph. Let the node set be A/' = A^r = {0, 1, . . . , 2'" — 1}. For i G Af 
write i = J2l=i 2*~^*s for is G {0, 1}. Similarly let j, k, and I be described in 
terms of js, kg, Is G {0, 1} for s = 1, . . . , r. 

The matrix entry P^ = py ' may be written 



For r > 2, we simplify the expression by induction using a smaller version of 
the problem defined via P('^-i). Specifically, 

E^?^4^^ = E--EE--EE--Eri 0-0^3^= 

ijk ii ir ji jr ki k-r 5=1 

; E • • • E E • • • E E ■ • • E n ^isA^) E ©-v©-^^ 

L il Jt-I fei fer-1 S = l ' irjrkr 

= (E^?"^^4^"^^) E ^^rAK 

ijk irjrkr 



— f ^irjr^irkr ] ' 



where indices i,, and kg are summed over their full ranges, and the indices i 

ij ik 



j, k for Pj^ ^^Pik ^6 summed over the node set Afr-i = {0, . . . , 2*" ^ — 1}. 



(r) 

All of the sums of products of elements of P-^j ' listed in the previous section, 
with summation over all levels of each index, also reduce this way to r'th powers 
of their value for the case r = 1. 

For r = 1 we need to sum products of elements of P over i or over i,j or 
over i,j,k. These cases correspond to the first 2, 4, or 8 rows of Table 1 for 

® " (c d)' iiistance E{H) requires X^j,/; PijPik which we know to be the 
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Table 1: This table shows entries in the matrix G with various indexing patterns 
needed in the examples. Sums over i, ij, and ijk use, respectively, the first 2, 
4, and 8 rows of the table. 

r'th power of 

Qi^j^&i^kr =a^ + ba + b'^ +cb + ab + b'^ + bc + c^ (9) 

= {a + bf + {b + cf. 

The first expression (9) follows by summing over the 8 rows of Table 1. As a 
result 

J2p,^P,f,= {{a + bf + ib + cfy. 

ijk 

In the rest of this section, we record the other sums we need. First, the sums 
over one index variable take the form 

^f^7 = (a" + c'")^ (10) 

i 

where cases m = 1, 2, 3 are used in our expected feature counts. The sums over 
two index variables are 

Y^PI^PIj = (a'"(a" + 6")+c'"(6" + c"))''. 

ij 

The cases we need are for (m,n) € {(0, 1), (0, 2), (0, 3), (1, 1), (1, 2)}. 
Four sums over three indices are used. They are: 

J2PijPik = {{a + bf + {b + cfY 

ijk 

^ PijPik = {a^ + c^ + b{a^ + c2) + 62(a + c) + 2by 

ijk 

^ PijPikPjk = (a^ + c'' + 'db^ia + c))^ and 

ijk 

PiiPijPik = {a{a + bf + c{b + cfy. 

ijk 
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Finally, one sum over four indices is used: 

^ PijPikPii = {{a + bf + {b + cfy. 

ijkl 

3.4 Expected feature counts 

Now we can specialize the results of Section 3.2 to the Kronecker graph setting. 
Gathering together the previous developments, we find 

2E(£;) = {a + 2b + cY - {a + cY 

2E{H) = {{a + bf + (6 + cfY - 2{a{a + b)+ c(c + b)Y 

_(a2 + 262 + c2)'- + 2(a2 + c2)'' 
6E(A) = (a^ + 362(a + c) + - 3(a(a' + + c{b'' + ^)Y + 2{a' + e'Y 
6E(T) = ((a + bf + {b + cfY - 3(a(a + bf + c{b + cff 

- 3(a3 + c3 + bio" + c2) + b\a + c) + 2b^Y + 2(a' + 26^ + c^Y 
+ 5(a3 + c3 + 62(a + c)Y + 4(0^ + + b{a^ + c^))" - 6(a3 + ti'Y ■ 

In each formula, the terms from sums over fewer indices come after the ones 
from more indices. The later terms adjust for loops and double edges and other 
degenerate quantities. For large r, we expect that the first term should be most 
important. In particular if min(a, &, c) > then in all cases the first quantity 
raised to the power r is the largest one. For example the first term in E(i?) is 
(1 + 2b/ {a + c)y times as large as the second one, which subtracts out loops. 

The first term will dominate for large r unless 6 <C a + c. The relative 
magnitude of the second term is 

f _a+J^ \ = 2''^°S2((«+c)/(a+26+c)) ^ j^-a 

\a + 2b + cj 

where a = log2((a + 26 + c)/(a + c)). If a > 1/2 then dropping the second 
term in makes a smaller difference than the sampling uncertainty in E. 

This holds when the off diagonal clement of Q is not too small compared to the 
average of the diagonal elements: b > {^/2 — l)(a + c)/2. 

3.5 Illustrations 

Some special cases of the formulas are of interest. For example if 6 = then 
there arc no edges in G* apart from loops. As a result G has 2"^ isolated nodes. 
We find from the above that E{E) = E{H) = E(A) = E(r) = when b = 0. 

If instead a = c = then each node i G J\f with coordinates ii , . . . , v has 
a dual node i* which has coordinates i* ~ 1 — ig for s = 1, . . . ,r. The only 
possible edges in G are between nodes and their duals. There are N = 2"^ nodes 
each with probability 6"" of having an edge out to its dual. The formula above 
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gives E(£;) = {2hY/2 = N¥/2 when a = c = 0, as it should. We also get 
E(if) = E(A) = E(r) = when a = c = 0. 

Ifo = 6 = c= l, then G* has every possible edge and loop with probability 
1. As a result G is the complete graph on TV = 2*" nodes. Then it has N{N ~\) /2 
edges, N{N - l)(iV - 2)/2 hairpins, N{N - l)(iV - 2)/2 triangles, and it has 
N{N - 1){N - 2){N - 3)/6 tripins. 



4 Solving for a, 6, and c 

There are four equations in Section 3.4. To estimate a, b, and c will require at 
least three of them. Because they are high order polynomials it is possible that 
there are multiple solutions or even none at all. The latter circumstance would 
provide some evidence of lack of fit of the stochastic Kronecker model to a given 
graph. Regardless, each of the equations involves the count of a feature in the 
graph. 



4.1 Counting features in a graph 

Three of the features we use are easily obtainable from the degrees of the nodes. 
Let di = ^j^j^ Aij be the degree of node i in graph G. Then 

i 

H =^'^di{di-l), and 

i 

T=^Y.di{di-l){di-2) 

i 

give the number of edges, hairpins (or wedges), and tripins in terms of the 
degrees di. 

The number of triangles A is not a simple function of di. Algorithms to 
count triangles are considered in [11]. The time complexity can be as low as 
0(£;^/2), and sometimes even lower for approximate counting [5]. 

4.2 Objective functions 

A pragmatic way to choose a, b, and c is to solve 

where the sum is over three or four of the features F G {E, H, T, A} from 
Section 3.4 and the minimization is taken over < c < a < 1 and < & < 1. 
The terms in (11) are scaled by an approximate variance. A sharper expression 
would account for correlations among the features used. That should increase 
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statistical cfScicncy, but in large problems lack of fit to the Kroneckcr model is 
likely to be more important than inefficiency of estimates within it. 

Many real world networks may not have good fits in terms of these three 
Kroneckcr parameters. This is the case for most of the forthcoming experiments. 
The following more general objective can be more robust to these deviances: 

Here D is either of the two distance functions: 

Dsq{x,y) ^ (x - or £)abs(a;, y) = |a; - t/| 

and N is one of the normalizations: 

Np (F, E) = F, Np2 (F, E) = , Ne (F, E) = E, A^e^ {F, E) = E^ . 

Using Dgq and A^e makes it equal to the previous objective (11). 

In principle, either of the two distance functions can be combined with any of 
the four normalizations. We do not think it is reasonable to expect a quadratic 
denominator to be a suitable match for the absolute error. Therefore our inves- 
tigations exclude combination of Dabs with either Nf2 or . 

We will find in Section 5 below that robust results arise from the combination 
Dgq and Np^, for which (12) reduces to 
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a sum of squared relative errors. 

Because there are only three parameters, the criterion (12) can simply be 
evaluated over a grid inside {(a, 6, c) € [0, 1]"^ | a > c}. To be sure of taking a 
point within e of the minimizer takes work 0{e~^). An alternative is to employ 
a general nonlinear minimization procedure. The remainder of this section looks 
at a method to reduce that effort. 



4.3 Matching leading terms 

In a synthetic graph A'' = 2'" is known. When fitting to a real world graph a 
pragmatic choice is r = [log2(A^)] . The interpretation is that the random graph 
G* may have had isolated nodes that were then dropped when forming G, but 
we suppose that fewer than half of the nodes in G* have been dropped. 

If we consider just the lead terms, then we could get estimates d, b, and c 
by solving three of the equations: 



e = (2F)i/'^ 


= a + 


26 + 


c, 


h = {2Hf''- 


= 


f6)2 


+ ib + cf, 


5 = (6A)i/'' 




+ c'] 


1 + 36^ (a + c) and 


t = {ery/'- 


= {a- 


f6)3 


+ {b + cf. 
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The equations for e and h together can be solved to get 



X = a + b 



y = b + c 

where we have assumed that a > c. The transformed tripin count t matches 
+ and so it is redundant given e and h, if we are jiist using lead terms. 

We must either count triangles, or use higher order terms. 

Equation (14) may fail to have a meaningful solution. At a minimum we 

require < 2h and e > \/2/i — e^. These translate into 

/i < < 2h, 

which is equivalent to 

2H < 4£;2 < 2''+^H 
that in terms of node degrees is 

^rfi(d,-l)< ($]di)'<iV^d,K-l). (15) 

i i i 

The left hand inequality in (15) holds for any graph, but the right hand side 
need not. It holds when N~'^ Tlii'^i ~ ^ d = di. If the variance of 

the node degrees di is smaller than their mean, then equation (14) does not have 
real valued solutions. The degree distribution of a stochastic Kronecker graph 
has heavy tails [10]. Therefore in applications where that model is suitable 
equation (14) will give a reasonable solution. 

When di have a variance larger than their mean, then we can do a univariate 
grid search for b € [0, 1] using equation (14) to get a = x — b = a(b) and 
c = y — b = c{b). The choice of b can then be made as the minimizer of 
\a{bf + c{bf + 3b^{a{b) + c{b)) - S\. 

5 Examples 

In this section, we experiment with different techniques for fitting the parame- 
ters of the Kronecker model. These experiments involve 8 real world networks 
whose statistical properties are listed in the rows of the forthcoming tables la- 
beled "Source." 

The networks ca-GrQc, ca-HepTh. ca-HepPh are co-authorship networks 
from arXiv [9]. The nodes of the network represent authors, and there is an 
edge between two nodes when the authors jointly wrote a paper. Likewise, the 
hollywood-2009 network is a collaboration graph between actors and actresses 
in IMDB [1, 2]. Nodes are actresses or actors, and edges are collaborations on 
a movie, as evidenced by jointly appearing on the cast. These networks are 
naturally undirected and all edges are unweighted. 



■. + VW- 



(14) 
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Both as20000102 and as-Skittcr arc technological infrastructure networks [9]. 
Each node represents a router on the internet and edges represent a physical or 
virtual connection between the routers. Again, these networks are undirected 
and unweighted. 

The wikipedia-20051105 graph is a symmetrized link graph of the articles 
on Wikipedia generated from a data download on November 5th, 2005 [3]. The 
underlying network is directed, but in these experiments, we have converted it 
into an undirected network by dropping the direction of the edges. 

All of the previously described networks have distinctly skewed degree dis- 
tributions. That is, there arc a few authors, actors, routers, or articles with a 
large number of links, despite the overall network having a small average de- 
gree. The final network we study is usroads, a highway- level network from the 
National Highway Planning Network (http://www.fhwa.dot.gov/planning/ 
nhpn/), which does not have a highly skewed distribution. We include it as an 
example of a nearly planar network. It is also naturally undirected. 

In two of the experiments, we generate synthetic Kronecker networks. The 
algorithm to realize these networks is an explicit coin-flipping procedure instead 
of the more common ball-dropping method [8]. For each cell i,j in the 
upper triangular portion, we first determine the log of the probability of a non- 
zero value in that cell, then generate a random coin flip with that probability 
as heads and record an edge when the coin comes up heads. This procedure 
is scalable because the full matrix of probabilities is never formed. It is also 
easily parallelizable. Our implementation uses pthreads to exploit multi-core 
parallelism. It takes somewhat more work than the ball-dropping procedure, 
scaling as 0(r2^'") instead of 0{rm), where m is the number of balls dropped. 
Often TO « 2^+^, that is, 8 balls per vertex [4]. Each ball generates about one 
edge; see [4] for a more thorough analysis. Coin-flipping preserves the exact 
Kronecker distribution whereas ball-dropping is an approximation. 

The experiments with these networks investigate (i) the difference in results 
from the various choices of D and N in the objective (12); (ii) the fitted param- 
eters to the 8 real world networks; and (iii) the difference in fitted parameters 
when only using three of the four graph features. 

5.1 Objective functions 

The first study regards the choice of objective function. Of eight possible com- 
binations of distance and normalization, we considered two to be unreasonable 
a priori. Here we investigate the other six pairs. 

Table 2 shows the different parameters a, b, and c chosen by each objec- 
tive function, as well as the expected feature counts for those parameters for 
three graphs: a single realization of a Kronecker graph with a = 0.99,6 = 
0.48, c = 0.25, the collaboration network ca-GrQc, and the infractucture net- 
work as20000102. The rows labeled "Source" contain the actual feature counts 
in each network. The optimization algorithm to pick a,b,c uses the best ob- 
jective value from three procedures. First, it tries 50 random starting points 
for the fmincon function in Matlab R2010b, an active set algorithm. Then, it 
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performs a grid search procedure with 100 equaUy spaced points in each dimen- 
sion. Finally, it tries the leading term matching algorithm from Section 4.3, and 
considers those parameters. 

The results in the table show that the choice of objective function does 
not make a difference when the graph fits the Kronecker model. However, it 
can make a large difference when the graph does not exactly fit, as in the ca- 
GrQc and as20000102 networks. Both of the objectives _Dsq, A'e2 and -Dabsj^E 
produced distinctly different fits for these two networks, compared to the other 
objectives. These two fits seem to be primarily matching the number of triangles 
- almost to the exclusion of the other features. The other odd fit for the ca- 
GrQc graph comes from the Dgq, Ne objective. This fit appears to be matching 
the tripin count and ignoring other features, something that also seems to be 
true for the as20000102 graph. Among the remaining fits for ca-GrQc, there is 
little difference among the fitted parameters and estimated features. The results 
are a bit different for as20000102. The fits for £>sq, A'e and Dg^jMp are almost 
identical and show a good match to the tripin count, but a poor match to the 
remaining features. The fits for D^q, Np^ and Dabsi Np are similar and deciding 
which is better seems like a matter of preference. These observations held up 
under further experimentation, which we omit here in the interest of space. 

Based on these results, either of the objectives Dgq, Np^ or r'abs) appears 
to be a robust choice when the model does not fit exactly. Due to the continuity 
of the Dgq function, the rest of our fits in this manuscript uses the Dsq^,Np2 
variation. 

5.2 Pcirameters for real-world networks 

For the 8 networks previously described, we use the objective function (12) 
with Dgq, Np2 to fit the parameters a, b, c. The results, along with the expected 
feature counts for the fitted parameters, are presented in Table 3. We show the 
minimizer for the three different strategies to optimize the objective described in 
the previous section: a direct minimization procedure, the grid search procedure, 
and the leading term matching approach (Section 4.3). For each approach, the 
table also shows the time required for that algorithm and the value of the 
objective function at the minimizer. 

Leskovec et al. [8] provide the fitted parameters a, b, and c from their KronFit 
algorithm for the networks ca-GrQc, ca-HepTh, ca-HepPh, and as20000102. We 
include them in Table 3 for comparison. In all cases but one, the expected 
feature count using KronFit is farther from the observed feature count than the 
expectation under our moment based fits. Sometimes it is much farther. There 
was one exception. For the graph as20000102, KronFit gave a better estimate 
of the number of edges than our moment method gave. 

KronFit typically underestimates the feature counts. The effect is severe for 
triangles. Kronecker random graphs commonly have many fewer triangles than 
the real world graphs to which they are fit. Our moment based estimators find 
parameters leading to many more triangles than the KronFit parameters do. 

In fairness, we point out that our method is designed to match expected to 
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Table 2: For three graphs, the fitted Kronecker parameters a, b, c for variations 
on the objective function (12). Subsequent columns show counts for these pa- 
rameters; the row labeled Source shows the actual network feature values -Fobs- 
The other rows show E(F)/i^obs- The objective column shows the value of the 
objective function at the minimizer. 

Graph Kron. Parameters Graph / Expected Features Obj. 



Fit type 


a 


b 


c 


Verts. 


Edges 


Hairpins 


Tripins 


Tris. 






Stochastic Kronecker 
















Source 


0.99 


0.48 


0.25 


16384 


30830 


521676 


8659050 


854 








0.99,3 


0.476 


0.255 


16384 


1.00 


1.00 


1.000 


1.0010 


7.76 


10-1 




0.99,3 


0.476 


0.254 




1.00 


1.00 


1.001 


1.0000 


9.72 


io-'3 




0.993 


0.476 


0.255 




1.00 


1.00 


1.000 


1.0014 


7.80 


10-1 




0.993 


0.476 


0.254 




1.00 


1.00 


1.001 


1.0000 


9.71 


10-'' 


-Dabs 1 


0.993 


0.476 


0.253 




1.00 


1.00 


1.000 


1.0000 


4.19 


10-3 




0.993 


0.476 


0.253 




1.00 


1.00 


1.000 


1.0000 


4.17 


10"3 


Leading 


0.990 


0.479 


0.250 




1.00 


1.00 


1.006 


0.9835 






ca-GrQc 






















Source 








5242 


14484 


229867 


2482738 


48260 






-Dsq, A^E 


1.000 


0.221 


1.000 


8192 


3.52 


2.74 


1.028 


0.0666 


9.14 


10^ 




1.000 


0.733 


0.000 




4.30 


29.82 


355.084 


0.9052 


2.53 


10° 




1.000 


0.459 


0.312 




1.17 


0.99 


1.001 


0.0107 4.77 


10^ 


Ds<i,Np2 


1.000 


0.467 


0.279 




1.06 


0.92 


1.0,35 


0.0107 


9.89 


10-1 


-Dabs I 


1.000 


0.737 


0.000 




4.51 


32.38 


397.213 


1.0000 


2.75 


10° 




1.000 


0.469 


0.267 




1.00 


0.87 


1.000 


0.0103 


1.12 


10° 


Leading 


1.000 


0.488 


0.229 




1.00 


1.00 


1.405 


0.0131 






as20000102 




















Source 








6474 


12572 


2059364 


6.75-10** 


6584 








1.000 


0.722 


0.000 


8192 


4.42 


2.73 


0.997 


5.2222 


2.32 


10*5 


Dsq,Nj,2 


0.712 


0.947 


0.000 




10.13 


4.89 


0.840 


1.1082 


1.49 


10" 




1.000 


0.722 


0.000 




4.40 


2.71 


0.989 


5.1843 


6.39 


10** 


-Dsq,JVF2 


1.000 


0.632 


0.000 




1.63 


0.51 


0.101 


0.7029 


1.54 


10° 


-Dabs, 


0.676 


0.980 


0.000 




11.83 


5.98 


1.000 


1.0000 


1.75 


10° 


-Dabs, Nf 


1.000 


0.648 0.000 




1.95 


0.68 


0.152 


1.0000 


2.12 


10° 
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observed feature counts, while KronFit fits by maximum likelihood. Therefore 
the evaluation criterion is closer to the fitting criterion for us. But maximum 
likelihood ordinarily beats or matches the method of moments in large samples 
from parametric models; it's mismatching criteria arc more than compensated 
for by superior statistical efficiency. The explanation here may involve maximum 
likelihood being less robust to lack of fit of the Kronecker model, or it may be 
that KronFit is not finding the MLE. 

The results in Table 3 show small differences in the fits between the direct and 
grid algorithms, although the direct algorithm is much faster. The leading term 
matching algorithm, when it succeeds, generates similar Kronecker parameters, 
although with a distinctly worse objective value. The results from the KronFit 
algorithm differ and likely match the graph in another aspect. 

Lead term matching is tens of times faster than direct search and roughly 
1000 times faster than grid search. But even the grid search takes under a 
minute in our examples, so the speed savings from the lead term approach is 
of little benefit here. For the large graphs, the time to compute the network 
features dominates the time to fit the parameters, showing that this approach 
scales to large networks. 

Overall, the results indicate that the Kronecker models tend not to be a 
good fit to the data. The model appears to have a considerable difference in 
at least once of the graph features. Usually, it's the number of triangles, which 
differs by up to two orders of magnitude for many of the collaboration networks. 

5.3 Fitting partial sets of features 

The previous set of experiments illustrated that the Kronecker graphs may not 
simultaneously fit all four of the network features: edges, hairpins/ wedges, trip- 
ins, and triangles. In Table 4, we examine the change in fits when only using 
three of the four network features in the summation in the objective (12). We 
take the set of parameters with the smallest objective among all the procedures 
investigated in the previous section. The results show small changes to the pa- 
rameters and expected feature fits. Nonetheless, the minimizer remains mostly 
unchanged. 

Table 4 provides a kind of cross-validated feature estimation, showing the 
accuracy of a feature's estimate when it is not included in the fitting. Apart 
from the exception noted before (the edge counts for as20000102) our moment 
based estimates give closer matches to the source feature counts than KronFit 
provides, whether the moment being studied is part of the fitting process or not. 

We see some examples where leaving out one feature seems to improve the 
fitting of another. For instance, in three of the four graphs, leaving out the 
tripin count improved the match for triangles and conversely. 
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Tabic 3: The fitted Kronecker parameters for variations on the algorithm - 
direct, grid, leading, or KronFit [8] - to minimize of the objective function (12) 
Subsequent columns show the expected feature counts for these parameters; 
the row labeled Source shows the actual network features. The time column is 
either the time to compute the features on the original graph or the time for 
the algorithm to fit the parameters. 

Graph Kroii. Parameters Graph / Expected Features Time 



Fit type a 


b 


c 


Verts. 


Edges 


Hairpins 


Tripins 


Tris. Obj. 


(sec.) 


ca-GrQc 




















Source — 






5242 


14484 


229867 


2482738 


48260 




<0.05 


Direct 1.000 


0.467 0.279 


8192 


1.06 


0.92 


1.035 


0.0107 0.989 


1.0 


Grid 1.000 


0.470 


0.270 


,1 


1.03 


0.91 


1.060 


0.0108 


0.991 


48.5 


Leading 1.000 


0.488 


0.229 


II 


1.00 


1.00 


1.405 


0.0131 


1.138 


<0.05 


KronFit 0.999 


0.245 


0.691 


II 


0.84 


0.20 


0.029 


0.0012 


2.935 


— 


ca-HepPh 




















Source — 






12008 


118489 


15278011 


1.28- 10^ 


3358499 


— 


1.9 


Direct 1.000 


0.669 


0.101 


16384 


1.11 


0.82 


1.064 


0.0164 


1.015 


0.8 


Grid 1.000 


0.670 


0.100 


II 


1.12 


0.84 


1.091 


0.0167 


1.016 


48.6 


Leading 1.000 


0.708 


0.005 


II 


1.00 


1.00 


2.021 


0.0196 


2.004 


<0.05 


KronFit 0.999 


0.437 0.484 


II 


u.oy 


n 1 n 

U. lU 




U.UUUD 


o.iyo 




ca-HepTh 




















Source — 






9877 


25973 


299356 


2098335 


28339 




<0.05 


Direct 1.000 


0.401 


0.379 


16384 


1.06 


0.92 


1.035 


0.0112 


0.989 


0.8 


Grid 1.000 


0.400 


0.380 


II 


1.05 


0.90 




0.0109 


0.991 




Leading 1.000 


0.423 


0.325 


II 


1.00 


1.00 


1 444 


0.0140 


1.169 




KronFit 0.999 


0.271 


0.587 


II 


0.74 


0.25 


U.U / O 


0.0020 


2.936 




hollywood 




















Source — 






1139905 


56375711 


4.76-101" 


3.24-10i'^ 


4.92-10" 




2946.1 


Direct 1.000 


0.623 


0.186 


2097152 


1.13 


0.76 


1.070 


0.0029 


1.075 


1.0 


Grid 1.000 0.620 


0.200 


II 


1.21 


0.80 


1.055 


0.0030 


1.083 


48.6 


Leading 1.000 


0.662 


0.095 


II 


1.00 


1.00 


2.670 


0.0046 


3.779 


<0.05 


as20000102 




















Source — 






6474 


12572 


2059364 


6.75-10** 


6584 




<0.05 


Direct 1.000 


0.632 


0.000 


8192 


1.63 


0.51 


0.101 


0.7029 


1.541 


0.8 


Grid 1.000 


0.630 


0.000 


II 


1.60 


0.49 


0.096 


0.6717 


1.543 


48.7 


KronFit 0.987 0.571 


0.049 


II 


0.99 


0.17 


0.018 


0.1738 


2.655 




as-skitter 




















Source — 






1696415 


11095298 


1.60-101° 


9.66-10" 


28769868 




107.0 


Direct 1.000 


0.644 


0.000 


2097152 


1.61 


0.74 


0.239 


0.1384 


1.755 


0.7 


Grid 1.000 


0.640 


0.000 


II 


1.48 


0.65 


0.199 


0.1181 


1.776 


48.7 


wiki-2005 




















Source — 






1634989 


18540603 3.72-101° 


3.72-10" 


44667105 




378.9 


Direct 1.000 


0.674 


0.000 


2097152 


1.64 


0.79 


0.211 


0.2589 


1.629 


0.6 


Grid 1.000 


0.670 


0.000 


II 


1.53 


0.70 


0.179 


0.2246 


1.646 


48.5 


usroads 




















Source — 






126146 


161950 


292425 


115885 


4113 




<0.05 


Direct 1.000 


0.070 


1.000 


131072 


0.88 


1.04 


1.057 


0.1177 0.798 


1.0 


Grid 1.000 


0.070 


1.000 


II 


0.87 


1.03 


1.012 


0.1148 


0.800 


48.5 
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Table 4: The change in fitted parameters when the objective function (12) only 
considers three of the four features. The row labeled "-Tris" , for instance, gives 
the fitted parameters when triangles are not included in (11). Rows labeled 
"source" again contain the actual graph features, and the rows labeled "all" 
show the parameters fitted to all four features. 

Graph Kron. Parameters Graph / Expected Features Time 



Fit type 


a 


b 


c 


Verts. 


Edges 


Hairpins 


Tripins 


Tris. Obj. 


(sec.) 


ca-GrQc 






















Source 








5242 


14484 


229867 


2482738 


48260 


— 


<0.05 


All 


1.000 


0.467 


0.279 


8192 


1.06 


0.92 


1.035 


0.0107 


0.989 


54.8 


KronFit 


0.999 


0.245 


0.691 


II 


0.84 


0.20 


0.029 


0.0012 


2.935 




-Edges 


1.000 


0.458 0.317 


II 


1.19 


1.00 


1.007 


0.0108 


0.978 


53.9 


-Hairpins 


1.000 


0.469 0.267 


II 


1.00 


0.87 


1.007 


0.0103 


0.980 


53.9 


-Tripins 


1.000 


0.493 0.216 


II 


0.99 


1.02 


1.536 


0.0139 


0.973 


54.0 


-Tris 


1.000 


0.467 0.279 


II 


1.06 


0.92 


1.029 


0.0106 


0.011 


56.1 


ca-HepPh 




















Source 








12008 


118489 


15278011 


1.28-10^ 


3358499 




1.9 


All 


1.000 


0.669 


0.101 


16384 


1.11 


0.82 


1.064 


0.0164 


1.015 


54.1 


KronFit 


0.999 


0.437 


0.484 


II 


0.69 


0.10 


0.014 


0.0006 


3.196 




-Edges 


1.000 


0.650 


0.192 


tf 


1.49 


1.02 


1.006 


0.0201 


0.960 


57.2 


-Hairpins 


1.000 


0.670 


0.083 


If 


1.01 


0.75 


1.007 


0.0146 


0.971 


57.2 


-Tripins 


1.000 


0.709 


0.005 


If 


1.01 


1.02 


2.065 


0.0200 


0.961 


56.9 


-Tris 


1.000 


0.669 


0.099 


ff 


1.10 


0.82 


1.058 


0.0162 


0.047 


54.7 


ca-HepTh 




















Source 








9877 


25973 


299356 


2098335 


28339 




<0.05 


All 


1.000 


0.401 


0.379 


16384 


1.06 


0.92 


1.0,35 


0.0112 


0.989 


57.4 


KronFit 


0.999 


0.271 


0.587 


ff 


0.74 


0.25 


0.073 


0.0020 


2.936 




-Edges 


1.000 


0.391 


0.417 


ff 


1.19 


1.00 


1.006 


0.0114 


0.977 


56.5 


-Hairpins 


1.000 


0.404 


0.365 


ff 


1.00 


0.87 


1.008 


0.0108 


0.979 


57.1 


-Tripins 


1.000 


0.431 


0.308 


ff 


0.98 


1.03 


1.623 


0.0152 


0.971 


56.9 


-Tris 


1.000 


0.401 


0.379 


ff 


1.06 


0.92 


1.028 


0.0111 


0.011 


56.6 


as20000102 




















Source 








6474 


12572 


2059364 


6.75-10* 


6584 




<0.05 


All 


1.000 


0.632 


0.000 


8192 


1.63 


0.51 


0.101 


0.7029 


1.541 


56.7 


KronFit 


0.987 


0.571 


0.049 


II 


0.99 


0.17 


0.018 


0.1738 


2.655 




-Edges 


0.935 


0.720 


0.000 


II 


3.04 


1.12 


0.235 


1.0796 


0.608 


57.3 


-Hairpins 


1.000 


0.621 


0.000 


II 


1.44 


0.41 


0.077 


0.5526 


1.250 


58.5 


-Tripins 


1.000 


0.628 


0.000 


II 


1.56 


0.47 


0.091 


0.6400 


0.723 


56.7 


-Tris 


1.000 


0.618 


0.000 


II 


1.39 


0.39 


0.071 


0.5137 


1.392 


57.1 
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6 Synthetic examples 



The results from the previous section show that there can often by a large 
deviation in the expected moments of the best Kronecker fit. In this section, we 
investigate the accuracy of the fitting procedure when the graph is a realization 
of a stochastic Kronecker network. 

For four sets of Kronecker parameters: 

• (a,6,c) = (0.99,0.48,0.25), r = 14 

• (a, 6, c) = (1.0, 0.67, 0.08), r = 14 

• (a, 6, c) = (0.999, 0.271, 0.587), r = 14 

• (a,6,c) = (0.87,0.6,0.7), r = 14 

we generate 50 realizations of each Kronecker graph. For each realization, we 
compute a fit using the objective (12) with the choices Dsq,Np2 and using the 
combination of approaches from the previous section. Figure 2 shows distribu- 
tion of fitted parameters to these 50 samples. For all four sets of parameters, 
the fitted results closely match the true values, with fairly small variation. 

For these synthetic problems, we also study how the empirical and fitted 
features difi^er. Figure 3 shows the distribution of the relative difference between 
the expectation of the fitted Kronecker features and the actual feature of each 
realization. It also shows the difference between the original feature count and 
the feature count of a re-realization. In other words, generate a Kronecker graph, 
fit the parameters, and re-generate with the fitted parameters. The figures show 
that the fitted parameters closely match the realizations. A curious property is 
that the fitted triangle count is always smaller than the empirical count. The 
difference in the re-realization can be large, almost 20% in the case of tripins or 
triangles for the first set of Kronecker parameters. 

Our final study is the distributions of the graph features given the Kronecker 
parameters, the expected features of the fitted parameters, and the graph fea- 
tures of a re-realized Kronecker graph. The plots in Figure 4 show that these 
distributions are all quite similar. 

7 Conclusions 

We have presented formuals for expected feature counts in Kronecker graphs 
and used them to generate a method of moments fitting strategy. We found 
that summing squared relative feature count errors was robust and easy to 
optimize. For graphs generated by the Kronecker model, our parameter and 
feature estimates closely match those of the fitted graph. For real world graphs 
we often find that the fitted Kronecker model implies smaller feature counts 
(apart from edges) than are seen in the real graph. The moment estimators 
typically come closer to the counts than those from KronFit. 
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0^ 
0.8 



(a) Kronecker 0.99, 0.48, 0.25, r = 14 



a = 1^00050 
40 
30 
20 
10 







b = 0.670 
9 



50 
40 
30 
20 
10 




c = 0.080 





1 0.6 0.65 0.7 0.05 0.1 0.15 

(b) Kronecker 1.0,0.67,0.08, r = 14 

50 
40 
30 
20 
10 



b = 0.271 








c = 0.587 



0.2 0.25 0.3 0.55 0.6 0.65 

(c) Kronecker 0.999,0.271,0.587, r = 14 



a = 0.870 




50 
40 
30 
20 
10 




b = 0.600 




50 
40 
30 
20 
10 




c = 0.700 




0.85 0.9 0.55 0.6 0.65 0.65 0.7 0.75 

(d) Kronecker 0.87,0.6,0.7, r = 14 



Figure 2: Histograms of fitted parameters to 50 realizations of a Kronecker 
graphs with the parameters given in the caption (red hnes). 
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(a) Kronecker 0.99, 0.48, 0.25, r = 14 
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20 
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0.02 -0.05 

20 
10 



0.05 



triangles 




0.2 -0.05 



0.05 -0.05 



0.05 



(b) Kronecker 1.0,0.67,0.08, r = 14 



Figure 3: Histograms of the relative difference between the true graph feature 
and the fitted (black solid) or regenerated (blue dashed line) feature. The 
relative difference is (Fti-uo - -Ffit)/^true- 



edges wedges edges wedges 




6 8 10 600 800 1000 1 1.2 1.4 4 4.5 5 

X 1 o'' X 1 0^ X 1 0'* 

(a) Kronecker 0.99, 0.48, 0.25, r = 14 (b) Kronecker 1.00, 0.67, 0.08, r = 14 



Figure 4: Histograms of empirical features. The dashed blue line with no x 
marks shows the empirically measured features and the solid black line with 
circles shows the expected value of each feature given the fitted parameters. 
These lines are often on top of each other. The dashed red line with x marks 
shows the features of a regenerated graph. 
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